Abstract



The usual Euclidean distance may be generalized to extended objects such as polymers or membranes. 
Here this distance is used for the first time as a cost function to align structures. We examined the 
alignment of extended strands to idealized beta-hairpins of various sizes using several cost functions, 
including RMSD, MRSD, and the minimal distance. We find that using the minimal distance as a cost
function typically results in an aligned structure that is globally different from that given by an
RMSD-based alignment.
Introduction



In a series of experiments starting in the late 1950s and culminating in a 1961 paper in the Proceedings 
of the National Academy of Sciences (?), C. B. Anfinsen showed that a protein such as bovine pancreatic
ribonuclease would, under oxidizing conditions, undergo slow but spontaneous reshuffling of disulfide 
bonds from a state with initially random cross-linked pairs, to a state with correct disulfide pairing and 
full enzymatic activity. The spontaneous formation of correct disulfide pairs indicated that the amino 
acid sequence itself was guiding the process towards more thermodynamically favorable configurations, 
and the so-called thermodynamic hypothesis in protein folding was born. 

This discovery underpinned the formalism developed decades later to understand protein folding as
a configurational diffusion process on an energy landscape that through molecular evolution had the
overall topography of a rugged funnel (43, ?). The initial random crosslinkings and subsequent slow
exchange of disulfide bonds observed by Anfinsen and colleagues argued against a mechanistic pathway
picture, but there was nevertheless a lag phase before the energy landscape picture eventually took hold.

Though the funnel picture was important as a conceptual tool, real predictive power was brought to bear
by quantifying the funnel notion to generate free energy surfaces as a function of a progress coordinate
that measured the degree to which a protein was folded (?). Soon thereafter, questions arose regarding
what coordinate(s) best represented folding progress, or whether one could even find a simple geometric
coordinate that would represent kinetically how folded a protein was. The kinetic proximity of a given
configuration was quantified unambiguously as the probability a protein would fold first before unfolding,
given that it was initially in that given configuration (?). This idea had earlier analogues in the Brownian
analysis of escape and recombination probabilities of an ionized electron (37).



Order parameters in protein folding 

The study of various order parameters that might best represent progress in the folding reaction has
generated much interest (?), with questions focusing
on what parameter(s) or principal component-like motions might best correlate with splitting probability
or probability of folding before unfolding.

On the other hand, analyses using intuitive geometric order parameters have been developed to
understand folding and are now commonly used. These include the fraction of native contacts Q
(28, 36, 42, ?), which can be locally or globally defined, root mean square distance or deviation (RMSD)
between structures (23, 26, 51, ?), structural overlap parameter χ (?, 14, ?), Debye-Waller factors (?, 50),
or fraction of correct dihedral angles (?).

Finding a simple geometrical order parameter that quantifies progress to the folded structure poses
several challenges. These include an accurate account of the effects of polymer non-crossing (33), energetic
and entropic heterogeneity in native driving forces (31, 41, 42), as well as non-native frustration and
trapping (6, 39, 47). Fortunately it has been borne out experimentally that wild type proteins are
sufficiently minimally frustrated that non-native interactions do not play a strong role in either folding
rate or mechanism, and native structure based models for folding rates and mechanisms have enjoyed
considerable success.

In condensed matter systems, useful order parameters have historically had intuitive geometrical
interpretations. Their definition did not require the knowledge of a particular Hamiltonian (although their
temperature-dependence and time-evolution were affected by the energy function in the system). In
chemical reactions, the distance between constituents in reactant and product has played a ubiquitous
role in the construction of potential energy surfaces (30). Moreover, from the point of view of stochastic
escape and recombination, the distance perfectly correlates with the commitment probability for a freely
diffusing particle between two absorbing boundaries.
Distance as an order parameter

The distance is easy to define for a point particle, which we imagine to travel between two locations A
at $\mathbf{r}_A$ and B at $\mathbf{r}_B$. It is the variational minimum of the functional:

$$\int_{\mathbf{r}_A}^{\mathbf{r}_B} ds = \int_0^T dt\, \sqrt{\dot{\mathbf{r}}^2} \tag{1}$$

where $\dot{\mathbf{r}} = d\mathbf{r}/dt$, and the initial and final conditions, or equivalently boundary conditions,
are $\mathbf{r}(0) = \mathbf{r}_A$ and $\mathbf{r}(T) = \mathbf{r}_B$.
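As a numerical illustration of equation (1), the following minimal sketch discretizes a path into a polyline and sums the segment lengths. The function names and the sine-shaped detour are assumptions made only for this example; the point is that any path sharing the same endpoints is at least as long as the straight one.

```python
import math

def path_length(points):
    """Discretized arclength: sum of |r(t_{k+1}) - r(t_k)| along the polyline."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def straight_path(a, b, steps=50):
    """Uniformly sampled straight line from a to b."""
    return [tuple(x + (y - x) * k / steps for x, y in zip(a, b))
            for k in range(steps + 1)]

def detour_path(a, b, bump=1.0, steps=50):
    """Same endpoints as the straight path, displaced in z by a sine bump."""
    pts = straight_path(a, b, steps)
    return [(x, y, z + bump * math.sin(math.pi * k / steps))
            for k, (x, y, z) in enumerate(pts)]

r_a, r_b = (0.0, 0.0, 0.0), (3.0, 4.0, 0.0)
d_straight = path_length(straight_path(r_a, r_b))  # Euclidean distance |r_B - r_A|
d_detour = path_length(detour_path(r_a, r_b))      # longer than the straight path
```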



However, until recently (33, 34, 40) the distance had not been formulated for higher dimensional objects
such as pairs of polymer configurations, despite close parallels in string theory (?).

In this paper, after briefly reviewing two common reaction coordinates, Q and RMSD, and the two
newer ones introduced and explored in (33, 34, 40), $\mathcal{D}$ and mean root squared distance (MRSD), we will
further explore structural alignments based on $\mathcal{D}$ for idealized hairpins.

Some problems with commonly used reaction coordinates

Many reaction coordinates have been used to describe the folding process, while still being flawed in 
principle. These characterizations have been largely successful because the majority of conformations 
during folding are well characterized by changes in these parameters: Proteins undergo some collapse 
concurrently with folding, lower their internal energy, and adopt structures geometrically similar to the 
native structure. 

Nevertheless it is easy to point to simple examples of conformational transitions for which the adoption 
of native structure does not correlate with the change in commonly used order parameters. While these 
conformational pairs may not be wholly representative of the total folding process, they point to situations 
where folding to a given structure would not be well-characterized by commonly used order parameters. 

Figure 1 shows two structures A and B with different measures of structural similarity to a "native"
hairpin fragment N. These structures have different measures of proximity depending on the coordinate
used to characterize them. If we use the fraction of native contacts Q to describe native proximity,
structure A has $Q_A = 1/3$ while $Q_B = 0$, so by this measure A is more native. If we use the root
mean square deviation (RMSD), structure B is more native-like than A. Moreover, structure B would
have a higher probability of folding before unfolding than A, i.e. it has a larger value of $P_{fold}$ (?),
and so is closer kinetically to the native structure. The longer the hairpin, the more likely a slightly
expanded structure is to fold, so the discrepancy between Q and RMSD for these pairs of structures
becomes even larger.

In contrast to RMSD, Q also does not distinguish between chiralities. Typically the energy function
forbids opposite chiralities; however, if the appropriate chirality is not enforced in the backbone dihedral
potentials, mirror-image structures as in figure 2 will be allowable, and are indistinguishable according
to Q (?).
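The chirality blindness of Q can be checked directly, because Q depends only on pairwise distances, which a reflection preserves. The sketch below is an illustration under stated assumptions: the toy coordinates, the cutoff of 1.5 link lengths, and the minimum sequence separation of 2 are hypothetical choices, not values taken from this paper.

```python
import math
from itertools import combinations

def contacts(coords, cutoff=1.5, min_sep=2):
    """Residue pairs (i, j) with j - i >= min_sep that lie closer than cutoff."""
    return {(i, j) for i, j in combinations(range(len(coords)), 2)
            if j - i >= min_sep and math.dist(coords[i], coords[j]) < cutoff}

def fraction_native_contacts(coords, native, **kw):
    """Q: native contacts also present in coords, normalized by the native count."""
    native_set = contacts(native, **kw)
    return len(contacts(coords, **kw) & native_set) / len(native_set)

# Hypothetical non-planar toy chain standing in for a "native" structure.
native = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0), (0, 1, 1), (0, 0, 1)]
mirror = [(x, y, -z) for x, y, z in native]   # reflection through the xy-plane

q_mirror = fraction_native_contacts(mirror, native)  # all native contacts survive
```

Since the mirrored chain is a different set of coordinates but has identical contacts, a contact-based coordinate cannot tell the two apart, whereas any coordinate built on superposed positions (RMSD, MRSD) can.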



While the RMSD is often characterized as a "distance" between structures, it is not equivalent or even
proportional to the sum of the straight-line distances between the atoms or residues in the two structures
(figure 3). This quantity is in fact given by the mean root squared distance (MRSD), defined for two
structures A and B as:

$$\mathrm{MRSD} = \frac{1}{N} \sum_{n=1}^{N} \sqrt{(\mathbf{r}_{A_n} - \mathbf{r}_{B_n})^2} = \frac{1}{N} \sum_{n=1}^{N} |\mathbf{r}_{A_n} - \mathbf{r}_{B_n}|$$

Footnote: The fraction of native contacts $Q = \sum_{i<j} \Delta_{ij}^{A} \Delta_{ij}^{N} / \sum_{i<j} \Delta_{ij}^{N}$ counts pairs of residues within some
cutoff distance in both structure A and structure N. This result is then normalized by the number of
contacts in the native structure.

Footnote: $\mathrm{RMSD} = \sqrt{N^{-1} \sum_{n=1}^{N} (\mathbf{r}_{A_n} - \mathbf{r}_{B_n})^2}$ is a least-squares measure of similarity between structures A
and B. Typically this quantity is minimized given two structures and so can be thought of as a
"least squares fit". The sum may be over all atoms, or simply all residues in coarse-grained models.

The RMSD between two structures is always greater than or equal to the MRSD between the same
structures, with MRSD = RMSD in only the most trivial cases (34). The RMSD is also less robust to
large fluctuations of select residues in structural pairs (?).

MRSD has a simple intuitive physical meaning: the MRSD between two structures gives the average
distance each residue in one structure would have to travel on a straight line to get to its counterpart in
the other structure (fig. 3).
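The inequality RMSD ≥ MRSD can be illustrated with a short sketch. Both functions below evaluate the measures for fixed coordinates (no superposition step), and the two toy conformations are hypothetical. Equality holds when every residue is displaced by the same amount, as in a rigid translation.

```python
import math

def rmsd(a, b):
    """Root mean square deviation between paired residue positions."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))

def mrsd(a, b):
    """Mean root squared distance: average straight-line distance per residue."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

a = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
b = [(0, 0, 0), (1, 0, 0), (2, 1, 0), (3, 3, 0)]   # residue displacements 0, 0, 1, 3
shifted = [(x, y + 2, z) for x, y, z in a]         # uniform displacement of 2

unequal = (rmsd(a, b), mrsd(a, b))          # RMSD exceeds MRSD
uniform = (rmsd(a, shifted), mrsd(a, shifted))  # both equal 2
```

Because the RMSD squares each displacement before averaging, the single residue displaced by 3 dominates it, which is the sensitivity to large fluctuations of select residues mentioned above.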



Polymer non-crossing in protein folding 

The above interpretation of MRSD points to a shortcoming of both MRSD and RMSD: neither accounts
for chain non-crossing constraints. Consider the two curves depicted in fig. 4, which differ
by having opposite sense of underpass/overpass. When both curves are aligned by minimizing MRSD
or RMSD, the respective values are almost zero. However, the physically relevant distance for one
conformation to transform to the other is much larger, and must involve one arm of the backbone
circumventing the other as it moves between conformations. The transformation which minimizes the
distance has been shown previously to involve motions wherein one end of the polymer doubles back
upon itself until it reaches the underpass/overpass, where it appropriately crosses under/over it, and
then proceeds snake-like to extend itself to the final position (33, 40). We will not deal further with the
aspects of non-crossing in this paper.



The generalized distance $\mathcal{D}$

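The distance between two points can be cast as a variational problem, where the arclength of the curve
between two points is minimized (equation (1); see fig. 5). With $\mathcal{L} = \sqrt{\dot{\mathbf{r}}^2}$, the resultant
Euler-Lagrange equations for the distance between two points are:

$$\frac{d}{dt}\left(\frac{\partial \mathcal{L}}{\partial \dot{\mathbf{r}}}\right) = 0$$

or

$$\dot{\hat{\mathbf{v}}} = 0$$

which means straight line motion, since the direction of the velocity, $\hat{\mathbf{v}} = \dot{\mathbf{r}}/|\dot{\mathbf{r}}|$, does not change.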

As mentioned in the introduction, the notion of distance between two points can be generalized to
two curves or higher-dimensional objects in general (40). As in the case of points, the distance between
two curves can be thought of as a variational problem, where one now minimizes the cumulative integrated
arclength between the two space curves:

$$\mathcal{D}[\mathbf{r}] = \int_0^L ds \int_0^T dt\, \sqrt{\dot{\mathbf{r}}^2} \tag{4}$$

Here $\mathbf{r} = \mathbf{r}(s,t) = (x(s,t), y(s,t), z(s,t))$ and $\dot{\mathbf{r}} = \partial\mathbf{r}/\partial t$. The independent variables in this formulation
are position along the contour of the polymer s and elapsed "time" during the transformation t.

Intuitively, the double integral in eq. (4) measures how much every part of the polymer moves in going
from one configuration to another (see fig. 6 for a schematic).

The minimal distance problem eq. (4) is not equivalent to a simple soap-film problem (see fig. 7). It also
has a lower symmetry than the relativistic world-sheet of a classical string (40), and so is inequivalent to
that problem as well.
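The difference from a soap-film area can be made concrete by evaluating the double integral of eq. (4) numerically for the sliding segment of fig. 7. The sketch below is a simple midpoint-in-s, polyline-in-t estimate; the function name, the grid sizes, and the choice T = 1 are assumptions made for illustration. For a segment of length L translated by d along itself, every contour point travels d, so the result is L·d even though the swept area is zero.

```python
import math

def generalized_distance(path, L, n_s=200, n_t=200):
    """Estimate int_0^L ds int_0^1 dt |dr/dt| for a given transformation.

    path(s, t) returns the position of contour point s at time t in [0, 1].
    """
    ds = L / n_s
    total = 0.0
    for i in range(n_s):
        s = (i + 0.5) * ds                  # midpoint of the i-th contour element
        prev = path(s, 0.0)
        for k in range(1, n_t + 1):
            cur = path(s, k / n_t)
            total += math.dist(prev, cur) * ds   # distance moved, weighted by ds
            prev = cur
    return total

L, d = 2.0, 0.5

def slide(s, t):
    """Segment along x, translated by d along its own direction."""
    return (s + d * t, 0.0, 0.0)

D_slide = generalized_distance(slide, L)    # close to L * d
```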
Minimizing equation (4) results in straight line motion of all points along the curve. This is because
equation (4) models not an inextensible string but an effective "rubber band" which can expand and
contract at no cost to facilitate the minimal-distance transformation. If the polymer cannot arbitrarily
stretch and contract (a good approximation for real polymers), the trajectories of the constituent segments
deviate from straight lines.

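The polymer is made inextensible by introducing the constraint

$$\left|\frac{\partial \mathbf{r}}{\partial s}\right| = \sqrt{\mathbf{r}'^2} = 1, \tag{5}$$

whereupon the functional to be minimized becomes

$$\mathcal{D} = \int_0^L \int_0^T ds\, dt\, \mathcal{L}(\dot{\mathbf{r}}, \mathbf{r}') \tag{6}$$

with effective Lagrangian:

$$\mathcal{L}(s,t) = \sqrt{\dot{\mathbf{r}}^2} - \lambda \left(\sqrt{\mathbf{r}'^2} - 1\right) \tag{7}$$

and Lagrange multiplier $\lambda = \lambda(s,t)$, a function of both s and t.

The new equations of motion obtained by extremizing the functional become:

$$\dot{\hat{\mathbf{v}}} = \lambda \boldsymbol{\kappa} + \lambda' \hat{\mathbf{t}} \tag{8}$$

where $\hat{\mathbf{t}}$ is the unit tangent vector and $\boldsymbol{\kappa}$ is the curvature vector (40).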

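Numerical solutions may be more readily obtained by discretizing the string as shown in figure 8. This
procedure is a particular example of the method of lines, used to obtain solutions of partial differential
equations. After discretization, the functional to be minimized becomes

$$\mathcal{D}[\mathbf{r}_i] = \int_0^T dt\, \mathcal{L}(\mathbf{r}_i, \dot{\mathbf{r}}_i), \tag{9}$$

where the effective Lagrangian $\mathcal{L}$ is:

$$\mathcal{L} = \sum_{i=1}^{N} \sqrt{\dot{\mathbf{r}}_i^2} - \sum_{i=1}^{N-1} \frac{\lambda_{i,i+1}}{2} \left( (\mathbf{r}_{i+1} - \mathbf{r}_i)^2 - b^2 \right). \tag{10}$$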

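Here b is the segment length, which we set to unity. The distances we obtain are thus in units of $b^2$.
The distance between space curves has the dimensions of area, just as the distance between points has
dimensions of length. Upon discretization the PDE of the system becomes a set of N coupled ODEs,
one for each residue:

$$\dot{\hat{\mathbf{v}}}_1 + \lambda_{12} (\mathbf{r}_1 - \mathbf{r}_2) = 0 \tag{11a}$$

$$\dot{\hat{\mathbf{v}}}_2 - \lambda_{12} (\mathbf{r}_1 - \mathbf{r}_2) + \lambda_{23} (\mathbf{r}_2 - \mathbf{r}_3) = 0 \tag{11b}$$

$$\vdots$$

$$\dot{\hat{\mathbf{v}}}_N + \lambda_{N-1,N} (\mathbf{r}_N - \mathbf{r}_{N-1}) = 0. \tag{11c}$$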

The solutions for the first and last (Nth) residues or beads consist of either straight-line motion of the
bead, pure rotation of the link terminating on the bead, or a stationary solution where the residue remains
at rest. Moreover, Weierstrass-Erdmann corner conditions or transversality conditions demand smooth
curves for solutions by disallowing discontinuities or cusps in the trajectories (34).
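To illustrate how the link-length constraint changes the picture, the following sketch evaluates the discretized functional (the summed path lengths of all beads, with b = 1) for a single link that turns by a quarter circle. The trajectory representation `frames[k][i]` and both helper names are assumptions for this example only. Moving the free bead on a straight line gives a smaller summed path length but violates the constraint of fixed link length; pure rotation of the link satisfies it.

```python
import math

def bead_path_length(frames):
    """Sum over beads of the polyline length of each bead's trajectory.

    frames[k][i] is the position of bead i at time step k.
    """
    n_beads = len(frames[0])
    return sum(math.dist(f0[i], f1[i])
               for f0, f1 in zip(frames, frames[1:]) for i in range(n_beads))

def max_link_error(frames, b=1.0):
    """Largest violation of |r_{i+1} - r_i| = b over all frames and links."""
    return max(abs(math.dist(f[i], f[i + 1]) - b)
               for f in frames for i in range(len(f) - 1))

steps = 2000
# Pure rotation of one link about a fixed first bead (inextensible chain).
rot_frames = [[(0.0, 0.0),
               (math.cos(0.5 * math.pi * k / steps), math.sin(0.5 * math.pi * k / steps))]
              for k in range(steps + 1)]
# Straight-line motion of the second bead between the same end states ("rubber band").
straight_frames = [[(0.0, 0.0), (1.0 - k / steps, k / steps)] for k in range(steps + 1)]

D_rot = bead_path_length(rot_frames)            # approaches pi / 2
D_straight = bead_path_length(straight_frames)  # sqrt(2), but the link shrinks en route
```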

Given two conformations which serve as boundary conditions on the equations of motion (11a-11c),
several solutions yielding slightly (non-extensively) different values of $\mathcal{D}$ can be constructed. It can be shown
that they are all local minima (34). In figure 9 two solutions are shown. Figure 9A depicts the global
minimum transformation, and figure 9B a sub-minimal "excited-state" transformation. The solutions
both involve either rotations of the constituent links or straight line motion of the constituent beads. In
figure 9A, rotation occurs away from the straight-line conformation and results in a distance $\mathcal{D} = 45.793$,
while in figure 9B rotation occurs from the curved conformation and results in $\mathcal{D} = 46.278$.

The fact that a real polymer cannot cross itself can be incorporated into the problem of finding the
minimal distance (33). Non-crossing is manifested as an inequality constraint (10, 24, 25, 48), which
appears in equation (10) as a Lagrange parameter for each residue i, multiplying the excluded volume
constraint. To describe this, let the unit vector from the kth to the (k+1)th bead be $\hat{\mathbf{e}}_k = (\mathbf{r}_{k+1} - \mathbf{r}_k)/b$;
then the vector to position $\mathbf{r}(s)$ at contour length s on the chain (see e.g. fig. 8) is

$$\mathbf{r}(s) = b \sum_{i=0}^{k-1} \hat{\mathbf{e}}_i + (s - kb)\, \hat{\mathbf{e}}_k = \mathbf{r}_k + (s - kb)\, \hat{\mathbf{e}}_k.$$
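The expression for the position at contour length s can be sketched in a few lines. The function name and the clamping of k at the chain end are assumptions of this example; beads are indexed from 0 as in the sum above, and s is taken to lie between 0 and the total contour length.

```python
def position_on_chain(beads, s, b=1.0):
    """r(s) = r_k + (s - k b) e_k, with k the index of the link containing s."""
    k = min(int(s // b), len(beads) - 2)   # clamp so s = total length uses the last link
    e_k = tuple((q - p) / b for p, q in zip(beads[k], beads[k + 1]))  # unit link vector
    return tuple(p + (s - k * b) * e for p, e in zip(beads[k], e_k))

# Hypothetical three-bead chain with unit links and a right-angle bend.
beads = [(0, 0, 0), (1, 0, 0), (1, 1, 0)]
midpoint_second_link = position_on_chain(beads, 1.5)
chain_end = position_on_chain(beads, 2.0)
```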

To constrain the motion of the beads so that the chain cannot cross itself, we add the term

$$\mu_i \left( \int ds\, |\mathbf{r}(s) - \mathbf{r}_i| + \epsilon_i^2 \right) \tag{12}$$

to the summand of equation (10). Note that by discretizing the problem to find the motion of residues,
there must be an asymmetry in the way that the chain is treated: in a continuum treatment the term in
the integrand of (12) would be $|\mathbf{r}(s) - \mathbf{r}(s')|$. The quantity $\epsilon_i^2$ in (12) is an "excess parameter" which is
zero unless a residue is directly constrained (touching some part of the rest of the chain). If $\epsilon_i^2 = 0$ the
problem of finding minimal distance is a "free" problem for residue i, and the equations of motion
(11a-11c) are unchanged. However, the corner conditions mentioned above induce an implicit "knowledge" of
the sterically avoided boundary, so that the motion of the residues is altered to travel most directly to
the steric surface constituting the constraint or obstacle. At this point the residue is constrained to be
on the surface of the obstacle and the trajectory is defined accordingly. Subsequently the residue leaves
the constraining surface and the problem becomes a free problem once again, travelling most directly to
the final conformation (33).

In the above treatment the chain has zero thickness. A tube thickness $\rho$ can be straightforwardly
incorporated into the treatment by letting $\mathbf{r}(s) \to \mathbf{r}(s) + \rho\, \hat{\mathbf{e}}_\rho$ in equation (12), and then integrating over
the surface of the cylinders which compose the resulting piece-wise tube.

Another modification that can be made to the Lagrangian is one involving curvature constraints.
In the current treatment the angle between two consecutive links of the chain can have any value, whereas
in real protein chains angles defined by bonds between atoms or residues are restricted. We will not
discuss these aspects in this manuscript.



The minimal distance between protein fragments 

In ref. (33) protein fragments such as an alpha helix and beta hairpin were considered for purposes
of calculating the minimal distance. An extended strand was aligned to the respective structures by
minimizing either RMSD or MRSD, and the distance $\mathcal{D}$ was subsequently calculated for the aligned
structural pairs. Both real and idealized protein fragments were considered. Most pairs of structures
had smaller distance minimal pathways when aligned using MRSD as the cost function. In some cases,
however, the smaller distance minimal pathway was obtained when the boundary conformations were
aligned using RMSD as the cost function.

For example, the straight line conformation in figure 10 was aligned to an idealized β-hairpin structure
also shown in that figure. The alignment was performed both by minimizing the MRSD between the
structures (figure 10A), and by minimizing RMSD between the structures (figure 10B). In each instance,
the minimal distance $\mathcal{D}$ between the structural pairs was calculated after alignment. The resulting aligned
straight-line structures have significantly different position/orientation depending on which cost function was
used, MRSD or RMSD: the MRSD between the two straight-line structures is in fact larger than the MRSD
between each and the hairpin structure (33).

Both transformations are minimal transformations but are subject to different boundary conditions
and thus yield different pathways and values of $\mathcal{D}$. The question remains as to how to align the structures to
obtain the minimum of all minimal transformations, i.e. the minimum minimal distance $\mathcal{D}$. To calculate
this quantity, $\mathcal{D}$ itself must be used as the cost function for alignment.

In this paper, we align structures using $\mathcal{D}$ as a cost function to obtain for the first time the minimum
of all minimal transformations. The structures that we consider are idealized straight-line segments with
varying number of links, which are then aligned to idealized beta hairpins using $\mathcal{D}$ as a cost function.
The alignment and resulting distance $\mathcal{D}$ are compared with the alignments and distances of RMSD and
MRSD. This is a first step toward aligning more complex structures using $\mathcal{D}$ as a cost function. We will
also see that there exist high order approximations which capture many of the properties of a true $\mathcal{D}$
alignment. Applying these approximate metrics to align structures such as a full protein is a topic for
future research.

Structural alignment of protein fragments using the distance D 

In principle, minimal pathways can be computed for any initial and final configurations, just as RMSD
can be computed between any two configurations. However, it is of special significance to anneal the
configurations allowing translations and rotations, until the minimal distance transformation is achieved (i.e.
the minimum of minimal distance transformations). This is analogous to the usual procedure of using
RMSD or MRSD as a cost function between two structures and minimizing with respect to translations
and rotations. While the minimization procedure is particularly straightforward for RMSD and involves
the inversion of a matrix, the minimization using the distance $\mathcal{D}$ as a cost function involves a simplex or
conjugate gradient minimization and so is more computationally intensive.

In short, the boundary conformations are allowed to translate and rotate in 3D space. Their position
and orientation are modified to produce a pathway with minimal length, as compared to all other minimal
pathways that can be obtained by positioning and orienting the same two structures in 3D space.

Footnote: In the limit of a large number of residues N, the distance converges to N times the MRSD:
$\mathcal{D} \to N \times \mathrm{MRSD}$, so for long chains MRSD can be considered a first step towards optimal alignment.
But ideally one wants to align the two structures using $\mathcal{D}$ itself as a cost function.
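For the RMSD cost function the optimal rigid-body superposition has a closed form. As a simplified stand-in for the general 3D matrix procedure (for which an SVD-based Kabsch superposition is a common choice), the sketch below treats the planar case, where the optimal rotation angle follows from two sums over the centered coordinates. The function names and test coordinates are assumptions for illustration.

```python
import math

def centered(pts):
    """Translate a planar point set so that its centroid sits at the origin."""
    n = len(pts)
    cx, cy = sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n
    return [(x - cx, y - cy) for x, y in pts]

def rmsd_align_2d(mobile, target):
    """Rotate and translate `mobile` to minimize its RMSD to `target` (planar case).

    Returns (aligned mobile, centered target), both in the target's centroid frame.
    """
    m, t = centered(mobile), centered(target)
    # Maximizing sum_i t_i . (R m_i) over the angle gives theta = atan2(num, den).
    num = sum(mx * ty - my * tx for (mx, my), (tx, ty) in zip(m, t))
    den = sum(mx * tx + my * ty for (mx, my), (tx, ty) in zip(m, t))
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in m], t

# A rotated and translated copy of a structure should superpose onto the original.
mobile = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0)]
ang = 0.7
target = [(math.cos(ang) * x - math.sin(ang) * y + 3.0,
           math.sin(ang) * x + math.cos(ang) * y - 1.0) for x, y in mobile]
aligned, ref = rmsd_align_2d(mobile, target)
residual = max(math.dist(p, q) for p, q in zip(aligned, ref))
```

No such closed form is available for MRSD or for the distance itself, which is why those cost functions call for an iterative search over translations and rotations.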
Method and Results

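For the purpose of generating accurate initial guesses for the minimal distance aligned structure, we
introduce the following hierarchy:

$$\mathcal{D}_0 = N \times \mathrm{MRSD} \tag{13a}$$

$$\mathcal{D}_1 = \sum_{i=1}^{N-1} \mathcal{D}\left(\ell_i^{(A)}, \ell_i^{(B)}\right) \tag{13b}$$

$$\mathcal{D}_2 = \text{(sum over double links, plus any single-link remainder)} \tag{13c}$$

$$\mathcal{D}_N = \mathcal{D} \tag{13d}$$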

In this hierarchy, the $\mathcal{D}_\alpha$ have the following interpretation: $\mathcal{D}_0$ is the cumulative distance between the
sets of points comprising the residue locations of conformations A and B, $\mathcal{D}_1$ is the cumulative distance
between the sets of single links, $\ell_i$, comprising configurations A and B, $\mathcal{D}_2$ is the cumulative distance
between the sets of double links comprising configurations A and B plus any single-link remainder
if one exists, and so on. That is, at level $\alpha$ the polymer chain is divided up into sub-segments each
of link-length $\alpha$, plus one segment constituting the remainder. When $\alpha = N$, the chain as a whole is
considered, which is the true distance $\mathcal{D}$. This procedure is also illustrated schematically adjacent to each
equation above.
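The bookkeeping behind this hierarchy can be sketched as follows. The helper names are hypothetical. `partition_links` only produces the division of the chain into blocks of α links plus a remainder; computing the distance for each block still requires solving the minimal-distance problem for that sub-segment, which is not attempted here. The zeroth level needs no partition and is simply the summed residue-to-residue distance.

```python
import math

def partition_links(n_links, alpha):
    """Split link indices 0..n_links-1 into consecutive blocks of `alpha` links,
    with a final shorter block holding any remainder."""
    return [list(range(start, min(start + alpha, n_links)))
            for start in range(0, n_links, alpha)]

def d0(conf_a, conf_b):
    """Level 0 of the hierarchy: N x MRSD, the summed residue-to-residue distance."""
    return sum(math.dist(p, q) for p, q in zip(conf_a, conf_b))

single = partition_links(5, 1)   # five single links
double = partition_links(5, 2)   # two double links plus a single-link remainder
whole = partition_links(5, 5)    # the chain as a whole
```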

We observed that $\mathcal{D}_1$ was a good approximation to the total $\mathcal{D}$ between two chains, was much easier in
practice to calculate, and could be automated in a robust way. For these reasons we used it to generate
initial guesses for minimal distance aligned structures. After the initial alignment using $\mathcal{D}_1$ the chains
were further aligned using the full distance $\mathcal{D}$. At this stage the general form of the transformation is
established and the computation can be automated. We used a Nelder-Mead simplex method in our
algorithm to find the minimal distance alignment.
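The general pattern of alignment by derivative-free minimization of a cost function over rigid-body parameters can be sketched as below. Two substitutions are made to keep the example short and self-contained, and both are assumptions of this sketch rather than the procedure used here: the cost function is MRSD instead of the full distance, and a simple compass (coordinate) search stands in for the Nelder-Mead simplex method. The planar hairpin coordinates are hypothetical.

```python
import math

def mrsd(a, b):
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def place(pts, theta, tx, ty):
    """Rigid-body transform in the plane: rotate by theta, then translate."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in pts]

def compass_search(cost, x0, step=0.5, tol=1e-6):
    """Derivative-free local minimizer: probe +/- step along each coordinate,
    keep any improvement, and halve the step when none is found."""
    x, fx = list(x0), cost(x0)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                trial = x[:]
                trial[i] += sign * step
                ft = cost(trial)
                if ft < fx:
                    x, fx, improved = trial, ft, True
        if not improved:
            step *= 0.5
    return x, fx

# Idealized planar hairpin with 6 residues and unit links, and an extended strand.
hairpin = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]
strand = [(float(i), 0.0) for i in range(6)]

def cost(params):
    return mrsd(place(strand, *params), hairpin)

start = [0.0, 0.0, 0.0]                  # initial guess (theta, tx, ty)
best, best_cost = compass_search(cost, start)
```

Like any local search, the result depends on the starting pose, which is why a cheap but accurate initial guess matters before switching to the expensive cost function.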

Figure 11 shows the aligned structures using RMSD, MRSD, $\mathcal{D}_1$, and $\mathcal{D}$, for increasing numbers of
links. Several points can be observed. For the smallest number of links (3), MRSD, $\mathcal{D}_1$, and $\mathcal{D}$ all give
the same alignment (fig. 11a). For 5 or more links, the MRSD-aligned structure breaks symmetry by
choosing a particular diagonal direction, while $\mathcal{D}_1$ and $\mathcal{D}$ retain this symmetry but begin to differ (fig. 11b).
The deviation between MRSD and $\mathcal{D}$ is a finite-size effect (40), so we know that the two alignments must
eventually converge as N is increased. At 9 links (fig. 11d), the $\mathcal{D}_1$-alignment breaks symmetry in the
same fashion as MRSD, yet the $\mathcal{D}$-alignment remains similar to RMSD. By 11 links (fig. 11e), the
$\mathcal{D}$-aligned structure has broken symmetry as well, however with a smaller angle to the horizontal than
either MRSD or $\mathcal{D}_1$. As N is increased, $\mathcal{D}_1$ and MRSD aligned structures quickly converge, while the
angle with respect to the horizontal of the $\mathcal{D}$-aligned structure continues to lag behind that of both the
MRSD and $\mathcal{D}_1$ structures, converging slowly as N continues to increase (figure 11, through panel j). The
RMSD-aligned structure remains horizontal throughout.

Average lengths of β-hairpins in databases constructed from the PDB are about 17 residues (?), most
consistent with fig. 11h. From this figure we see that hairpins of this length have a globally different
structural alignment with extended structures depending on whether $\mathcal{D}$ or RMSD is used.

Table 1 and figure 12 summarize the results for the minimal distance transformations from the aligned
structures. Table 1 gives the numerical value of $\mathcal{D}/N$ for each aligned structure, aligned using
the various cost functions listed: $\mathcal{D}$, $\mathcal{D}_1$, MRSD, and RMSD. Note that the distance $\mathcal{D}$ is always
minimized for the distance-aligned structure, and tends to increase as one considers the $\mathcal{D}_1$-, MRSD-, and
then RMSD-aligned structures for a given number of links.

For comparison, in table 2 the corresponding values of MRSD are given for the aligned structures using
each cost function. Note in each table that as $N \to \infty$, $\mathcal{D}/N$ tends to converge to MRSD.
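These statements can be checked directly against the tabulated values. The sketch below copies the entries of Tables 1 and 2 (columns ordered as $\mathcal{D}$-, $\mathcal{D}_1$-, MRSD-, and RMSD-aligned) and verifies that $\mathcal{D}/N$ is smallest for the distance-aligned pair in every row, that $\mathcal{D}/N$ is never below the MRSD of the same pair, and that for the MRSD-aligned pairs the ratio of $\mathcal{D}/N$ to MRSD falls steadily with N. The variable names are assumptions of this example.

```python
# N -> (D-aligned, D1-aligned, MRSD-aligned, RMSD-aligned)
table1_D_over_N = {
    4: (0.785, 0.785, 0.785, 0.822),   6: (1.391, 1.415, 1.473, 1.419),
    8: (1.974, 1.983, 2.085, 2.014),  10: (2.559, 2.574, 2.654, 2.615),
    12: (3.127, 3.158, 3.197, 3.216), 14: (3.674, 3.705, 3.726, 3.817),
    16: (4.207, 4.235, 4.247, 4.418), 18: (4.732, 4.769, 4.762, 5.019),
    20: (5.252, 5.294, 5.272, 5.620), 22: (5.767, 5.802, 5.783, 6.221),
}
table2_mrsd = {
    4: (0.707, 0.707, 0.707, 0.809),   6: (1.375, 1.393, 1.337, 1.412),
    8: (1.961, 1.960, 1.899, 2.008),  10: (2.547, 2.545, 2.436, 2.610),
    12: (3.062, 3.108, 2.959, 3.211), 14: (3.575, 3.675, 3.475, 3.813),
    16: (4.081, 4.004, 3.987, 4.414), 18: (4.585, 4.506, 4.495, 5.015),
    20: (5.088, 5.008, 5.002, 5.616), 22: (5.591, 5.511, 5.508, 6.218),
}

# D/N is minimized by the D-aligned pair (column 0) for every N.
d_aligned_is_min = all(row[0] == min(row) for row in table1_D_over_N.values())

# D/N is at least the MRSD of the same aligned pair, for every cost function.
d_bounds_mrsd = all(d >= m for n in table1_D_over_N
                    for d, m in zip(table1_D_over_N[n], table2_mrsd[n]))

# Ratio (D/N) / MRSD for the MRSD-aligned pairs, ordered by increasing N.
ratios = [table1_D_over_N[n][2] / table2_mrsd[n][2] for n in sorted(table1_D_over_N)]
```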

The distance travelled per residue, in units of link length, is $\mathcal{D}/Nb$. Dividing this measure by the chain
length $(N-1)b$ gives a scale-invariant measure of the distance: $\tilde{\mathcal{D}} = \mathcal{D}/(N(N-1)b^2)$. This quantity
is plotted in figure 12. We can see from the plot that the $\mathcal{D}_1$-aligned structure generally gives a good
approximation to the true $\mathcal{D}$-aligned structure. Moreover, MRSD, $\mathcal{D}_1$ and $\mathcal{D}$ all converge to the same
value, while RMSD converges to a dissimilar value.
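As a worked example of this normalization, the sketch below converts the $\mathcal{D}/N$ entries of Table 1 for the $\mathcal{D}$-aligned and RMSD-aligned pairs into the scale-invariant measure, taking b = 1. The function and variable names are assumptions for illustration.

```python
def scale_invariant(d_over_n, n, b=1.0):
    """D / (N (N - 1) b^2), computed from the tabulated D/N (in units of b^2)."""
    return d_over_n / ((n - 1) * b * b)

# N -> (D/N for D-aligned pairs, D/N for RMSD-aligned pairs), copied from Table 1.
d_over_n = {
    4: (0.785, 0.822),  6: (1.391, 1.419),  8: (1.974, 2.014), 10: (2.559, 2.615),
    12: (3.127, 3.216), 14: (3.674, 3.817), 16: (4.207, 4.418), 18: (4.732, 5.019),
    20: (5.252, 5.620), 22: (5.767, 6.221),
}

invariant = {n: tuple(scale_invariant(v, n) for v in pair)
             for n, pair in d_over_n.items()}
gap_at_22 = invariant[22][1] - invariant[22][0]   # RMSD-aligned minus D-aligned
```

For the longest chain studied, N = 22, this gives roughly 0.275 for the $\mathcal{D}$-aligned pair and 0.296 for the RMSD-aligned pair.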

Conclusion and Discussion 

In this paper, we reviewed the concept of the generalized distance $\mathcal{D}$, and then used it as a cost function to
align unfolded idealized strands of various sizes to their corresponding idealized β-hairpin structures. This
is the first time that the true Euclidean distance has been used as a cost function for structural alignment.
The distance $\mathcal{D}$ for the minimal transformation between aligned structural pairs was compared for various
alignment cost functions: RMSD, MRSD, $\mathcal{D}_1$, and $\mathcal{D}$ itself. $\mathcal{D}_1$ is the distance between conformational
pairs if the chain were decimated to single links and the distances of all single-link transformations were
summed.

We found that $\mathcal{D}_1$-aligned structures generally gave a distance that was close to that of the true $\mathcal{D}$-aligned
structure, and in this sense $\mathcal{D}_1$ was a good approximation. However, the aligned structures were noticeably
different depending on the cost function, for the finite values of N that we studied. Our largest value of
N was 22 residues, while the average length of β-hairpins is about 17 residues. For these average hairpin
lengths, the minimal $\mathcal{D}$ aligned structure is globally different from the RMSD structure. Whether this
discrepancy is generally true for larger structures or whole proteins remains to be determined, but we
feel it is likely. It is not yet clear at this point whether alignment using distance will yield more accurate
predictions for such problems as protein structure prediction or ab-initio drug design. What is clear is
that the best-aligned structures using a reasonable alignment metric such as the true distance give very
different results than RMSD, even for relatively simple structures such as the beta-hairpin.
Tables

Table 1: $\mathcal{D}/N$ (in units of link length squared) between the aligned structures in figure 11. Each of the 4
columns represents the structural pairs for the cost function labelled. For example, column 3 gives $\mathcal{D}/N$
for structural pairs in figure 11 aligned using MRSD.

          Alignment cost function
  N       D        D_1      MRSD     RMSD
  4       0.785    0.785    0.785    0.822
  6       1.391    1.415    1.473    1.419
  8       1.974    1.983    2.085    2.014
  10      2.559    2.574    2.654    2.615
  12      3.127    3.158    3.197    3.216
  14      3.674    3.705    3.726    3.817
  16      4.207    4.235    4.247    4.418
  18      4.732    4.769    4.762    5.019
  20      5.252    5.294    5.272    5.620
  22      5.767    5.802    5.783    6.221

Table 2: MRSD (in units of link length) between the aligned structures in figure 11 using the four cost
functions we considered. For example, column 1 gives MRSD for structural pairs in figure 11 aligned
using the distance $\mathcal{D}$.

          Alignment cost function
  N       D        D_1      MRSD     RMSD
  4       0.707    0.707    0.707    0.809
  6       1.375    1.393    1.337    1.412
  8       1.961    1.960    1.899    2.008
  10      2.547    2.545    2.436    2.610
  12      3.062    3.108    2.959    3.211
  14      3.575    3.675    3.475    3.813
  16      4.081    4.004    3.987    4.414
  18      4.585    4.506    4.495    5.015
  20      5.088    5.008    5.002    5.616
  22      5.591    5.511    5.508    6.218

Figures



Figure 1: Order parameters do not always correlate with kinetic proximity. Structure A above is more
native-like according to the fraction of native contacts, while structure B is more native-like according
to RMSD, and is also closer kinetically to the native structure.



Figure 2: Native structure of SH3 (right) and its mirror image. Although dissimilar by RMSD, biologically 
nonfunctional, and disallowed by true dihedral potentials, this structure has a Q = 1, because native 
contacts remain intact after mirroring transformations. 

Figure 3: The MRSD is the average length of the black line segments between corresponding residues of
the initial and final configuration.



Figure 4: The MRSD and RMSD between the two curves are close to zero (the curves in this figure are 
displaced for better viewing but should be imagined to be superposed). But because the curve cannot pass 
through itself, in order to undergo the transformation one leg must undergo relatively large amplitude 
motions to travel from one conformation to another. This results in a non-zero distance between the 
conformations by accurate metrics which can account for non-crossing. 

Figure 5: Distance between the two points A and B is the minimum length of the curve connecting
the two points.




Figure 6: The distance $\mathcal{D}_{AB}$ is the accumulation of how much every part of the contour defining the space
curve moves in the transformation between two conformations A and B.

Figure 7: The line segment A is displaced by d along itself, to B. The soap film area $A_{soap}$ between the
two segments is 0. But the distance $\mathcal{D}_{AB} = Ld$.

Figure 8: The lower curve is a discretized version of the upper one. After discretization the PDE for the
upper curve becomes a set of N coupled ODEs for the N residues in the lower chain (a sample residue
is marked with a circle).




Figure 9: Minimal and sub-minimal transformations between a straight line and a quarter circle (see text
for description). For the left transformation $\mathcal{D} = 45.793$ and for the right one $\mathcal{D} = 46.278$.

Figure 10: (color) $\mathcal{D}$-minimizing transformations for MRSD-aligned (yellow) and RMSD-aligned (cyan)
hairpins. Intermediate state is shown in grey. The distances for each transformation, in units of link
length squared, are 3.20 for MRSD-aligned and 3.22 for RMSD-aligned structures.

Figure 11: (color) Alignments with different cost functions. The hairpin is shown in red, the $\mathcal{D}$ alignment
in green, $\mathcal{D}_1$ in blue, MRSD in yellow, and RMSD in cyan.

Figure 12: Scale invariant distance resulting from alignments with different cost functions.



